MPI’s Reduction Operations in Clustered Wide Area Systems
Abstract
The emergence of metacomputers and computational grids makes it feasible to run parallel programs on large-scale, geographically distributed computer systems. Writing parallel applications for such systems is a challenging task that may require changes to the communication structure of the applications. MPI’s collective operations (such as broadcast and reduce) allow some of these changes to be hidden from the application programmer. We have developed MAGPIE, a library of collective communication operations optimized for wide area systems. MAGPIE’s algorithms are designed to send the minimal amount of data over the slow wide area links and to incur only a single wide area latency. This paper discusses MPI’s collective reduction operations. Compared to systems that do not take the topology into account, such as MPICH, large performance improvements are possible. For larger messages, the best performance is achieved when the reduction function is associative. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, operations execute up to 8 times faster than MPICH; application kernels improve by up to a factor of 3. Due to the structure of our algorithms, the advantage increases for higher wide area latencies.
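The two design goals stated in the abstract (little data over the wide area links, one wide area latency) can be illustrated with a small simulation. This is only a sketch under assumptions, not MAGPIE's actual algorithm: it assumes an associative reduction function, one coordinator per cluster, and the root living in cluster 0. The function names `cluster_aware_reduce` and `binomial_wan_messages` are hypothetical, and the second function models a generic topology-unaware binomial-tree reduce rather than any particular MPICH release.

```python
from functools import reduce
import operator


def cluster_aware_reduce(clusters, op):
    """Reduce values grouped by cluster; cluster 0 is assumed to hold the root.

    Returns (result, wan_messages). Assumes `op` is associative.
    """
    # Step 1: each cluster combines its own values over fast local links.
    partials = [reduce(op, values) for values in clusters]
    # Step 2: every non-root cluster sends its single partial result
    # directly to the root, so each WAN link is crossed once and the
    # transfers can overlap (one WAN latency on the critical path).
    wan_messages = len(clusters) - 1
    return reduce(op, partials), wan_messages


def binomial_wan_messages(cluster_of_rank):
    """Count messages that cross clusters in a binomial-tree reduce to rank 0.

    cluster_of_rank[r] is the cluster id of rank r. The tree is built from
    rank numbers only, ignoring the topology (illustrative assumption).
    """
    n = len(cluster_of_rank)
    count = 0
    step = 1
    while step < n:
        # In this round, rank `child` sends its partial result to `child - step`.
        for child in range(step, n, 2 * step):
            if cluster_of_rank[child] != cluster_of_rank[child - step]:
                count += 1
        step *= 2
    return count


if __name__ == "__main__":
    clusters = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(cluster_aware_reduce(clusters, operator.add))
    # Ranks spread round-robin over 4 clusters: many tree edges cross the WAN.
    print(binomial_wan_messages([r % 4 for r in range(16)]))
```

In this model the cluster-aware variant always sends one wide area message per non-root cluster, whereas the number of wide area messages in the topology-unaware tree depends on how ranks happen to be placed on clusters.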
Publication date: 1998